Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for sr-SR #40

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Adding support for sr-SR #40

wants to merge 11 commits into from

Conversation

nciric
Copy link
Contributor

@nciric nciric commented Dec 18, 2024

Starting a how to add a new language document. I'll add more steps as I progress (I would like to have working build before I go any deeper).

I added sr, sr_RS to the LangUtil to try it out. I see that Croatian is already there, but also not supported so I feel this is fine.

@nciric nciric requested a review from grhoten December 18, 2024 22:18
@grhoten
Copy link
Member

grhoten commented Dec 19, 2024

Please consider the following patch as a starting point.
add-sr.patch

Here are some caveats.

  1. It's missing the lexical dictionary. That's your biggest blocker.
  2. The script of the pronouns (Latin) does not match the script of the RBNF rules (Cyrillic). Feel free to change as appropriate, or switch to hr to match the CLDR default. If not that, then consider switching to sr_Latn_RS instead.
  3. The RBNF rules look incomplete in CLDR. So grammatical correctness of quantities is probably incorrect. See RuCommonConceptFactory on a more complete solution. You may want to consider adding some test cases with different grammatical cases in QuantifyTest#testSerbian. You may find inspiration in the QuantifyTest#testRussian test for how to write such a test.
  4. The display function needs to be implemented. See the Russian display function for a more complete implementation.
  5. If there was a lexical dictionary, then it would be possible to detect the properties for grammatical number and gender as this patch is currently written.
  6. I'm assuming that the properties in grammar.xml is complete enough. Feel free to verify.
  7. It's missing some data driven tests. I recommend adding some in morphuntion/test/resources/morphuntion/dialog/inflection/sr.xml.

@nciric
Copy link
Contributor Author

nciric commented Dec 19, 2024

Please consider the following patch as a starting point. add-sr.patch

Great patch, thanks!

Here are some caveats.

  1. It's missing the lexical dictionary. That's your biggest blocker.

I didn't notice Croatian was already in. Do we have a lexicon for it already, or at least a test sized one (I can transliterate it)?

  1. The script of the pronouns (Latin) does not match the script of the RBNF rules (Cyrillic). Feel free to change as appropriate, or switch to hr to match the CLDR default. If not that, then consider switching to sr_Latn_RS instead.

Changed to Cyrillic.

  1. The RBNF rules look incomplete in CLDR. So grammatical correctness of quantities is probably incorrect. See RuCommonConceptFactory on a more complete solution. You may want to consider adding some test cases with different grammatical cases in QuantifyTest#testSerbian. You may find inspiration in the QuantifyTest#testRussian test for how to write such a test.

I need working build to start testing. Is it ok to submit as is for now and come back to testing later?

  1. The display function needs to be implemented. See the Russian display function for a more complete implementation.
  2. If there was a lexical dictionary, then it would be possible to detect the properties for grammatical number and gender as this patch is currently written.
  3. I'm assuming that the properties in grammar.xml is complete enough. Feel free to verify.

They seem ok, as they are based on Croatian entry already there.

  1. It's missing some data driven tests. I recommend adding some in morphuntion/test/resources/morphuntion/dialog/inflection/sr.xml.

Is it ok to add them later after we figure out the build problem? But definitely on my TODO list.

@grhoten
Copy link
Member

grhoten commented Dec 19, 2024

After some thought, I think it's best to separate out the instructions from the patch to add a new language. The tests currently pass, and these changes are incomplete (primarily due to the missing lexical dictionary). I think the patch to add a new language needs further work, but it's a good start.

@nciric nciric changed the title How to add a new language doc, adding sr_SR to the list. Adding support for sr-SR Dec 19, 2024
@nciric
Copy link
Contributor Author

nciric commented Dec 20, 2024

After some thought, I think it's best to separate out the instructions from the patch to add a new language. The tests currently pass, and these changes are incomplete (primarily due to the missing lexical dictionary). I think the patch to add a new language needs further work, but it's a good start.

I moved the instruction doc to a separate PR (you are on it).

@nciric
Copy link
Contributor Author

nciric commented Jan 9, 2025

@grhoten I have two question that would speed things up:

  • Where do I implement inflection rules? Currently the tests fail with "Inflector not found for sr" message.
  • What's the process of adding a dictionary?
    • I can craft a manual one (to pass the tests) following the format for Russian inflection/resources/org/unicode/inflection/dictionary/dictionary_ru.lst
    • But how do I generate the rules xml file (inflection/resources/org/unicode/inflection/dictionary/inflectional_ru.xml) and then update the lexicon entries with inflection= value?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants